BLOCK-BASED, ADAPTIVE, LOSSLESS IMAGE CODER
Field of the Invention 

This invention relates to selective compression of digital images. 
Background of the Invention 
Compression of digital images using lossless schemes is an integral
part of a wide variety of applications that include medical imaging, remote
sensing, printing, and computers. Recent advances in digital electronics
and electromechanics have also encouraged the wide use of digital images.
The algorithms for compression (or coding) of images have
become sophisticated, spurred by the applications and by standardization
activities such as JPEG ("Digital Compression and Coding of Continuous
Tone Images", ISO Document No. 10918-1). The lossy version of JPEG,
introduced around 1990, gained an enormous following in the industry
due to its simplicity, public domain software, efforts by the Independent
JPEG Group (IJPEG), and availability of inexpensive custom hardware
(C-Cube Microsystems). The lossless counterpart did not gain significant
acceptance, but provided momentum for diversified research activities.

The primary approaches in lossless compression coding have used
differential pulse code modulation (DPCM), followed by entropy coding
of the residuals (W. Pennebaker and J. Mitchell, JPEG Still Image
Compression Standard, Van Nostrand Reinhold, New York, 1993).
Recently, schemes that utilize transforms or wavelets have also been
investigated and have gained acceptance (A. Zandi et al, "CREW:
Compression with reversible embedded wavelets", Proc. of Data
Compression Conference, March 1995, pp. 212-221; F. Sheng et al,
"Lossy and lossless image compression using reversible integer wavelet
transforms", Proc. I.E.E.E., 1998). However, the majority of the
promising techniques have employed sophisticated DPCM and entropy
coding techniques. These methods rely heavily on the statistical modeling
of the data (source) (M. Weinberger et al, "On universal context modeling
for lossless compression of gray scale images", I.E.E.E. Trans. on Image
Processing, 1996). Although such approaches have given excellent
compression performance, they are cumbersome to implement and often
inefficient as software programmable solutions implemented on digital
signal processors (DSPs) or general purpose microprocessors. Efforts
have been made to reduce the complexity of the statistical modeling
portion in some of the best performing coders, CALIC (X. Wu et al,
"Context-based, adaptive, lossless image coding", I.E.E.E. Trans. on
Communications, vol. 45, 1997, pp. 437-444), and LOCO (M. Weinberger
et al, "LOCO-I: A low complexity, context-based lossless image
compression algorithm", Proc. of 1996 Data Compression Conference,
1996, pp. 140-149). Even with such efforts, the computational complexity
is daunting. One primary reason for this is a context switch that occurs on
a pixel boundary. This approach introduces several data dependent
compute and control complexities in the encoder and the decoder.

What is needed is an image compression approach that reduces the
computational complexity but retains many of the attractive features of the
most flexible compression approaches. Preferably, the approach should
allow selective uses of lossless compression and lossy compression for
different portions of the same image, without substantially increasing the
complexity that is present when only lossless compression or only lossy
compression is applied to an image.

Summary of the Invention

These needs are met by the invention, which provides a block-based
coder that permits multiple levels of parallel implementation. The pixels in
each input block are coded using a differential pulse code modulation
(DPCM) scheme that uses one of several selectable predictors. The
predictor for a block is chosen using local characteristics of the block to
be coded. Prediction residuals (differences between actual and predicted
values) are mapped to a non-negative integer scale and are coded using a
new entropy coding mechanism based on a modified Golomb Code (MGC).
In addition, a novel run-length encoding scheme is used to encode specific
patterns of zero runs. The invention permits parallel processing of data
blocks and allows flexibility in ordering the blocks to be processed.

A block of data values is examined to determine if the data values
are all the same. A dc-only block uses a selected predictor and is easily
compressed for later use. A non-dc-only block is examined according to
selected criteria, and an optimal predictor is selected for this block. A
residual value (actual value minus predicted value) is computed and
clamped, and the block of clamped values and corresponding predictor
index are processed for compression, using an efficient mapping that takes
advantage of the full dynamic range of the clamped residual values.

Context modeling can be included here without substantially
increasing the computational complexity, by making the context switch
granularity depend upon a "block" of pixels (e.g., PxQ), rather than on a
single pixel, to allow inclusion of a transition region where a switch
occurs. In some imaging applications, lossless and lossy
techniques are combined to compress an image. For example, a portion of
the image corresponding to a majority of text information might have to
be losslessly coded, while the portion of the image with continuous-tone
gray-scale information can be coded with some visual distortion to obtain
higher compression. In such applications, the input image is segmented to
identify the regions to be losslessly coded. Accordingly, lossy coders and
lossless coders are switched on and off region-by-region. However, many
of the lossy and lossless coders may work only on entire images. The
"chunking" by the segmentation algorithm makes it inefficient to code
small blocks using the existing methods.

The approach disclosed here is applicable to mixed mode images
that may contain graphics, text, natural images, etc. The context switch at
the block levels can be adapted for lossy coding. Thus, one obtains a single
coder format that fits both lossy and lossless cases and encompasses an
image segmenter as well.
Brief Description of the Drawings 

Figures 1A, 1B, 1C and 7 are schematic views of apparatus to
practice the invention.

Figures 2A, 2B and 2C are a flow chart of the inventive procedure.

Figure 3 graphically illustrates zero-run statistics for an image.
Figures 4, 5 and 6 illustrate use of parallelism with the invention.



Description of the Invention 

An image can be represented as a rectangular array of PxQ blocks
of pixels, each of which may contain text, graphics, natural images, etc.
The rectangular image to be coded is split into multiple P x Q blocks
of pixels, where P and Q may, but need not, coincide. Each block is first
evaluated to determine if all the pixels in the block have a single selected value;
this indicates a dc-only block. If all the pixel values in the block are
identical, it is sufficient to code only one sample for that block. In
addition, rather than encoding the raw pixel value, a prediction of the
current sample value is made using previously coded adjacent sample
values, and the difference between the current sample value and the
predicted value is encoded. This technique, Differential Pulse Code
Modulation (DPCM), has been used in image and speech coding.

Figure 1A is a block diagram of apparatus that can be used to
practice the invention. An image can be represented as a rectangular array
of PxQ blocks of pixels, each block having P rows and Q columns. Each
block may contain text, graphics, natural images, etc. The image to be
coded is split into a plurality of PxQ blocks of pixels. Rastered pixel
values are received in one or more streams by a raster-to-block converter
61 (optional) that converts rastered data to block data. A comparator 63
interrogates each block of pixel value data and determines if the data in a
given block are all "dc"; that is, if all pixel values in the given block have
the same value. If the comparator answers "yes", a differential encoder 65
encodes dc differential data for that dc block (e.g., all values constant) and
sends the compressed dc block data to a compressed data module 79 that
accumulates compressed data for each block.
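
The dc test performed by the comparator 63 can be sketched in C as follows. This is a minimal illustration only: the function name is_dc_block and the row-major layout of the block are assumptions, not part of the described apparatus.

```c
#include <stddef.h>

/* Returns 1 if all n pixel values of the block are identical (a "dc" block),
   0 otherwise. The block is assumed to hold its P*Q pixels in row-major
   order, so n = P*Q. An empty block is treated as dc. */
int is_dc_block(const unsigned char *block, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (block[i] != block[0]) {
            return 0;   /* first differing pixel: not a dc block */
        }
    }
    return 1;
}
```

A dc block then needs only one coded sample, as described above; any other block goes through predictor selection.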

For non-dc blocks, where the answer to the query of the comparator
63 is "no", the optimal prediction scheme has the potential to vary
block-by-block, or even pixel-by-pixel. A prediction scheme that is optimal for
the current input block is chosen, block-by-block, from among several
predictors. The different predictor schemes may be statically or
dynamically chosen. A predictor selection module 67 receives the input
block of pixel value differences, selects the optimal predictor for the
current input block, and provides this information for a predictor module
69 and for the compressed data module 79.

Where static predictor selection is implemented, a set of predictors
is selected to be used in a certain image coding system. For a given block,
the predictor that is optimal for this block is chosen, and the information
on which this selection is based is sent to a decoder as part of a coded
bitstream. In a dynamic selection scheme, an optimal predictor for a given
block can be found and used to obtain the prediction differences or
residuals. The information needed to uniquely specify a predictor may be
sent to the decoder as part of the coded bitstream.

Let $x(i,j)$ be the current pixel image value to be encoded, where the
index i represents the scan line of the current pixel and the index j
represents the pixel location on the scan line. For example, three
neighboring sample values, $x(i-1,j-1)$, $x(i-1,j)$ and $x(i,j-1)$, can be used to
predict the value for the current sample. In this example, the prediction
can be made as a linear combination

$$x_p(i,j) = a \cdot x(i-1,j-1) + b \cdot x(i-1,j) + c \cdot x(i,j-1), \tag{1}$$

where the real numbers a, b and c characterize the predictor.

In a static selection scheme, the predictor coefficients are restricted
to a predetermined set. For example, static predictor coefficients may be
restricted to eight linear combinations of the three known pixel image
values, as illustrated in Table 1.

Table 1. Predictor Coefficients

Predictor Index       a        b        c
       0              1        0        0
       1              0        1        0
       2              0        0        1
       3              1        1       -1
       4              1        0.5     -0.5
       5              0.5      1       -0.5
       6             -0.5      0.5      1
       7              0.5     -0.5      1

In Table 1, the coefficients a, b and c for any predictor index Pred_index
have an algebraic sum of 1 (not required). In a static prediction system,
for each input block the predictor that is most suitable for the current
input block is chosen from among a fixed number of choices, such
as the eight in Table 1, and this information is sent to the decoder as a three-bit index.
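
As an illustration, the eight static predictors of Table 1 can be sketched in C. This is a minimal sketch, not the coder's actual implementation: the names COEF2 and predict are hypothetical, the coefficients are stored doubled so that integer arithmetic suffices, and truncation toward zero for half-valued results is an assumption, since the text does not state a rounding rule.

```c
/* Table 1 coefficients (a, b, c), each multiplied by 2 so that the
   half-valued entries become integers. Indexed by Pred_index. */
static const int COEF2[8][3] = {
    { 2,  0,  0}, { 0,  2,  0}, { 0,  0,  2}, { 2,  2, -2},
    { 2,  1, -1}, { 1,  2, -1}, {-1,  1,  2}, { 1, -1,  2}
};

/* Prediction relation (1), with
     nw = x(i-1,j-1), n = x(i-1,j), w = x(i,j-1).
   The doubled sum is halved with C integer division, which truncates
   toward zero (an assumed rounding rule). The result is NOT clamped. */
int predict(int pred_index, int nw, int n, int w)
{
    const int *k = COEF2[pred_index & 7];   /* three-bit index */
    return (k[0] * nw + k[1] * n + k[2] * w) / 2;
}
```

For instance, `predict(3, 0, 0, 128)` evaluates a = b = 1, c = -1 and yields -128, a value outside the pixel range that the clamping step then handles.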

In a dynamic selection scheme, the predictor coefficients can take 
any values and are not restricted to a fixed set of values for a given input 
block. The encoded values of the filter coefficients are also sent to the 
decoder. 

In the predictor module 69, because the dynamic range of the input
pixels is known for a block, the predicted values are arranged to lie in the
same range. For example, each of the input samples with an M-bit
representation has a value in the range $[0, 2^M-1]$. Depending upon the
predictor coefficients used and the pixel values, the predicted value may
lie outside the normal range of $[0, 2^M-1]$. As a result, the prediction
values are preferably clamped to fall within the dynamic range of the
input pixels.

For example, where M = 8 bits per pixel are used to form the
images, let a = b = 1, c = -1, $x(i-1,j-1) = x(i-1,j) = 0$ and $x(i,j-1) = 128$.
According to the prediction relation (1), the predicted value is $x_p(i,j) = -128$.
Because the predicted value $x_p(i,j)$ is less than the minimum value,
the predicted value is clamped to the minimum value, $x_{\min}$, which is 0
here. In a similar manner, a predicted value greater than the maximum
value is clamped to the maximum value, $x_{\max}$, which is 255 here.

One clamping procedure that can be followed in this example is

    if (xp(i,j) < xmin) xp(i,j) = xmin;
    if (xp(i,j) > xmax) xp(i,j) = xmax;

Another procedure, expressed in mathematical form, that achieves this
result is

$$x_p(i,j)_{\mathrm{clamped}} = \min\{\max\{x_{\min},\, x_p(i,j)\},\, x_{\max}\} \tag{2A}$$

or

$$x_p(i,j)_{\mathrm{clamped}} = \max\{\min\{x_{\max},\, x_p(i,j)\},\, x_{\min}\}. \tag{2B}$$

A third mathematical procedure that achieves the desired result is

$$x_p(i,j)_{\mathrm{clamped}} = 0.5\,\{\,|x_p(i,j) - x_{\min}| - |x_p(i,j) - x_{\max}| + x_{\max} + x_{\min}\,\}. \tag{3}$$

For each procedure, the values after clamping are limited to a range

$$x_{\min} \le x_p(i,j) \le x_{\max}, \tag{4}$$

where $x_{\min}$ may be 0 and $x_{\max}$ may be $2^M-1$, if desired.
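
The three clamping procedures can be compared side by side in C. The sketch below is illustrative only; the function names are hypothetical, and integer arguments with xmin ≤ xmax are assumed.

```c
#include <stdlib.h>   /* abs */

/* Conditional form: two comparisons. */
int clamp_if(int xp, int xmin, int xmax)
{
    if (xp < xmin) xp = xmin;
    if (xp > xmax) xp = xmax;
    return xp;
}

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Relation (2A): min{ max{xmin, xp}, xmax }. */
int clamp_minmax(int xp, int xmin, int xmax)
{
    return imin(imax(xmin, xp), xmax);
}

/* Absolute-value form. The numerator always equals 2*xmin, 2*xp or
   2*xmax, so the integer division by 2 is exact. */
int clamp_abs(int xp, int xmin, int xmax)
{
    return (abs(xp - xmin) - abs(xp - xmax) + xmax + xmin) / 2;
}
```

The absolute-value form is branch-free, which may suit DSP or SIMD implementations; the other two forms are easier to read.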

A difference module 71 receives a block of actual pixel values $x(i,j)$
and a corresponding block of predicted pixel values $x_p(i,j)$ and forms and
issues difference values $\Delta x(i,j) = x(i,j) - x_p(i,j)$. These difference values
are received by a block encoder 73 and by a variable length code (VLC)
table selector 75. The block encoder 73 forms and issues compressed data
for non-dc blocks with the information from the VLC table selector 75,
which provides information on which VLC table should be used. The
compressed data module 79 concatenates all information from the
different blocks and issues the compressed bitstream in a specific format.

Where the following block is a dc-only block, the Pred_index
number 0 is used. Where the following block is not dc-only, the predictor
that gives the best cumulative results for each P x Q block is chosen. Thus,
each P x Q block may require a different choice of Pred_index. Given a
block to be encoded, any suitable criterion may be used to select the
predictor coefficients. Specification of any of the eight possible predictors
in Table 1 uses a three-bit field (for non-dc-only blocks). The number of
predictor indices is not limited to 8 and may be any reasonable number.
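
Since any suitable criterion may be used, one possible choice is the sum of absolute residuals over the block. The C sketch below assumes that criterion, assumes 8-bit pixels, and assumes the block does not touch the top row or left column of the image, so that all three neighbors exist. The names COEF2, predict_clamped and select_predictor are hypothetical.

```c
#include <stdlib.h>   /* abs */

/* Table 1 coefficients (a, b, c), doubled to stay in integers. */
static const int COEF2[8][3] = {
    { 2,  0,  0}, { 0,  2,  0}, { 0,  0,  2}, { 2,  2, -2},
    { 2,  1, -1}, { 1,  2, -1}, {-1,  1,  2}, { 1, -1,  2}
};

/* Prediction (1) at pixel (i,j), clamped to [0, 255]. */
static int predict_clamped(const unsigned char *img, int stride,
                           int i, int j, int idx)
{
    const int *k = COEF2[idx];
    int xp = (k[0] * img[(i - 1) * stride + (j - 1)]
            + k[1] * img[(i - 1) * stride + j]
            + k[2] * img[i * stride + (j - 1)]) / 2;
    if (xp < 0)   xp = 0;
    if (xp > 255) xp = 255;
    return xp;
}

/* Returns the Pred_index (0..7) with the smallest sum of absolute
   residuals over the P x Q block whose top-left pixel is (r0, c0).
   Requires r0 >= 1 and c0 >= 1. Ties go to the lowest index. */
int select_predictor(const unsigned char *img, int stride,
                     int r0, int c0, int P, int Q)
{
    int best = 0;
    long best_cost = -1;
    for (int idx = 0; idx < 8; idx++) {
        long cost = 0;
        for (int i = r0; i < r0 + P; i++)
            for (int j = c0; j < c0 + Q; j++)
                cost += abs(img[i * stride + j]
                            - predict_clamped(img, stride, i, j, idx));
        if (best_cost < 0 || cost < best_cost) {
            best_cost = cost;
            best = idx;
        }
    }
    return best;
}
```

Other criteria, such as the estimated coded bit count, would fit the same loop structure by replacing the cost term.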

Figure 1B is a block diagram illustrating an approach for
implementing the encoder block 65 for a dc block of data. A mapper
module 65A receives an input signal, representing the dc value for that
block, at a first input terminal and receives a selected dc prediction value
for that block at a second input terminal. The mapper module 65A
provides a mapper output signal that is received at a first input terminal of
a dc/modified Golomb coding (MGC) module 65B. The dc/MGC module
65B receives a (constant length) MGC vector K, discussed in the
following, at a second input terminal. The MGC module 65B provides an
output bitstream of compressed (dc) block data that is received by the
compressed data module 79 (Figure 1A).

Figure 1C is a block diagram illustrating an approach for
implementing the encoder block 73 for a non-dc block. A mapper module
73A receives an input block of (non-constant) data $x(i,j)$ at a first input
terminal and receives difference data $\Delta x(i,j)$ for that block at a second
input terminal. The mapper module 73A provides an output signal that is
received at a first input terminal by a run-length modified Golomb coding
(MGC) module 73B. The MGC module 73B receives an MGC vector K,
discussed in the following, at a second input terminal. The MGC module
73B provides an output bitstream of compressed (non-dc) block data that
is received by the compressed data module 79 (Figure 1A).

The difference

$$\Delta x(i,j) = x(i,j) - x_p(i,j) \tag{5}$$

between a pixel image value $x(i,j)$ and the corresponding predictor value
$x_p(i,j)$ has a value in a range between $-(2^M-1)$ and $+(2^M-1)$. However, if
the prediction value $x_p(i,j)$ is known and satisfies a constraint such as (4),
the difference value $\Delta x$ can take only an eight-bit range (for M = 8) around the
prediction value, so that a 1-1 mapping can be constructed of the
difference, $\Delta x = x - x_p$, onto, for example, the integer set $[0, 2^M-1]$.

For any predictor, the difference $\Delta x(i,j)$ between the actual value
and a predicted value of pixel content will have a maximum value and a
minimum value that are related by

$$\Delta x_{\max} - \Delta x_{\min} = 2^M - 1, \tag{6}$$

although the individual values for the maximum and minimum values may
vary from one pixel to another pixel. A suitable 1-1 mapping of the
positive integers and the negative integers in the pixel value range into a
single segment of the non-negative integers, such as [0, 511], is defined by

$$F(0) = 0, \tag{7A}$$

$$F(\Delta x) = 2 \cdot \Delta x - 1 \quad (\Delta x = 1, 2, 3, \ldots, 2^M-1), \tag{7B}$$

$$F(\Delta x) = 2 \cdot |\Delta x| \quad (\Delta x = -(2^M-1), -(2^M-2), \ldots, -2, -1). \tag{7C}$$

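
The unmodified mapping F and its inverse can be written compactly in C. This is a small sketch for illustration; the names map_residual and unmap_residual are hypothetical.

```c
/* Mapping (7A)-(7C): zero maps to 0, positive differences map to odd
   values, negative differences map to even values. */
unsigned map_residual(int dx)
{
    return dx > 0 ? 2u * (unsigned)dx - 1u : 2u * (unsigned)(-dx);
}

/* Inverse of map_residual: odd values are positive differences,
   even values are zero or negative differences. */
int unmap_residual(unsigned f)
{
    return (f & 1u) ? (int)((f + 1u) / 2u) : -(int)(f / 2u);
}
```

For M = 8 the differences -255..255 land in 0..510, which is why one extra bit is needed unless the known prediction value is exploited.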
For the range of difference values for the quantity $\Delta x(i,j)$, a
modified mapping, $F(\Delta x;\mathrm{mod})$, is introduced that (1) includes all
difference values that can be reached, (2) has a range limited to at most
$2^M$ consecutive values and (3) is also 1-1. The mapping $F(\Delta x;\mathrm{mod})$ is
defined differently for each realistic range of the difference value $\Delta x$. For
example, assume that M = 8 and that the maximum value and minimum
value for the difference $\Delta x$ for a particular pixel are +5 and -250,
respectively, with all difference values between -250 and +5 being
reachable by the difference $\Delta x$. The mapping $F(\Delta x;\mathrm{mod})$ provides the
following sequence of correspondences for this example:



  Δx      F(Δx;mod)
   0          0
   1          1
  -1          2
   2          3
  -2          4
   3          5
  -3          6
   4          7
  -4          8
   5          9
  -5         10
  -6         11
  -7         12
  -8         13
  -9         14
  ...        ...
-250        255


The correspondence defined by the mapping $F(\Delta x;\mathrm{mod})$ takes
account of the fact that the integers +6, +7, ... , +255, -255, -254, -253,
-252, -251 cannot appear in the legitimate values for the difference $\Delta x$ and
are thus invalid integers that can be deleted in the sequence of (only) valid
integers in this sequence. When the invalid integers $\Delta x$ = +6, +7, ... ,
+255, -255, -254, -253, -252, -251 are deleted from the sequence of all
integers in the range [-255, +255], the remaining 256 integers permit
construction of the 1-1 mapping $F(\Delta x;\mathrm{mod})$ set forth in the preceding. The
particular mapping $F(\Delta x;\mathrm{mod})$ will depend upon the particular maximum
and minimum values for the difference $\Delta x$. Other suitable 1-1 mappings
(permutations) that achieve the same result as the mapping $F(\Delta x;\mathrm{mod})$ can
also be used here. The mapping $F(\Delta x;\mathrm{mod})$ can be generalized to other
choices of the index M as well.

Existence of a mapping such as $F(\Delta x;\mathrm{mod})$ can be used to calculate
the code word index of the differential value. The following computation
procedure is used for a choice of M = 8:

    int getVldIndex(unsigned char input, unsigned char prediction)
    {
        int levels, maxval, tmp;
        unsigned int index;
        int x, xh;

        levels = 256;
        maxval = levels - 1;
        xh = prediction;
        x = input;
        tmp = 2*xh;

        if ((x <= xh) && (tmp <= (maxval + x))) {
            /* zero or negative difference, interleaved region: even index */
            index = 2*(xh - x);
        }
        else if ((x <= xh) && (tmp > (maxval + x))) {
            /* negative difference beyond the reach of positive ones */
            index = maxval - x;
        }
        else if ((x > xh) && (tmp < x)) {
            /* positive difference beyond the reach of negative ones */
            index = x;
        }
        else {
            /* (x > xh) && (tmp >= x): positive difference, odd index */
            index = 2*(x - xh) - 1;
        }
        return index;
    }
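
Because the mapping is 1-1 for a known prediction, a decoder can invert it. The text does not give the inverse, so the following C sketch is an assumption derived from the four cases of the encoder; the names vld_index (a compact restatement of the index computation, with a non-negative index for differences at or below zero) and vld_value are hypothetical.

```c
enum { MAXVAL = 255 };   /* M = 8 */

/* Compact restatement of the index computation for M = 8. */
int vld_index(int x, int xh)
{
    if (x <= xh)
        return (2 * xh <= MAXVAL + x) ? 2 * (xh - x) : MAXVAL - x;
    return (2 * xh < x) ? x : 2 * (x - xh) - 1;
}

/* Assumed inverse: recovers the pixel value x from the index and the
   (clamped) prediction xh. */
int vld_value(int index, int xh)
{
    int below = xh;              /* pixel values available below xh */
    int above = MAXVAL - xh;     /* pixel values available above xh */
    int m = below < above ? below : above;

    if (index <= 2 * m) {
        /* interleaved region: even = at or below xh, odd = above xh */
        return (index % 2 == 0) ? xh - index / 2 : xh + (index + 1) / 2;
    }
    /* one-sided tail: only the longer side remains */
    return (below > above) ? MAXVAL - index : index;
}
```

With a prediction of 250, the index 11 decodes to 244 (a difference of -6), and with a prediction of 20 the index 255 decodes to 255.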



The procedure set forth in the preceding is illustrated in flow chart
format in Figures 2A, 2B and 2C. The system provides or receives a block
of binary-valued numbers, including at least one Provided Value ("Prov
Value"), in step 101.

In step 103, the system determines if the pixel values in the present
block all have the same value so that the block is a dc block. If the answer
to the question in 103 is "yes", the system chooses a selected dc predictor
coefficient, such as Pred_index = 0 in Table 1, in step 105, and computes a
Predicted Value ("Pred Value") for the block values, using a dc predictor
coefficient. From step 105, the system moves to step 109, discussed in the
following. If the answer to the question in 103 is "no", the system moves
to step 107, where it chooses a second Pred_index and a selected set of
non-dc predictor coefficients and computes at least one Predicted Value in
the block, using the non-dc predictor coefficients.

In step 109 (optional), the system limits each predicted binary value 
to a selected range of values, with a max value. In step 111, the system 
computes a Product Value ("Prod Value"), equal to twice the Predicted 
Value. In step 113, the system computes a Difference Value, equal to the 
difference between the Provided Value and the corresponding Predicted 
Value. 

In step 115, the system determines if both of (1) Provided Value ≤
Predicted Value and (2) Product Value ≤ Provided Value + max value are
satisfied. If the answer to the compound question in 115 is "yes", the
system moves to step 117 and computes a Value Index, equal to twice the
magnitude of the Difference Value, and moves to step 131, discussed in the
following.

If the answer to the question in 115 is "no", the system moves to
step 119 and determines if both of (1) Provided Value ≤ Predicted Value
and (2) Product Value > Provided Value + max value are satisfied. If the
answer to the compound question in 119 is "yes", the system moves to step
121 and computes a Value Index, equal to max value minus the Provided
Value, and moves to step 131.

If the answer to the question in 119 is "no", the system moves to
step 123 and determines if each of two conditions, (1) Provided Value >
Predicted Value and (2) Product Value < Provided Value, is satisfied. If
the answer to the compound question in 123 is "yes", the system moves to
step 125 and computes a Value Index, equal to the Provided Value, and
moves to step 131.

If the answer to the question in 123 is "no", the system moves to
step 127 and determines if each of two conditions, (1) Provided Value >
Predicted Value and (2) Product Value ≥ Provided Value, is satisfied. If
the answer to the compound question in 127 is "yes", the system moves to
step 129 and computes a Value Index, equal to twice the Difference Value
minus 1, and moves to step 131. In step 131, the system uses the computed
Value Index to represent at least one number in the block.

Except for the boundary cases, where a pixel is located on an edge
of a P x Q block, the positive difference values are scaled by a factor of 2,
and then reduced by 1; the negative differential values are first negated (so
that they become positive) and then multiplied by a factor of 2. The
number 1 is subtracted from each scaled positive value (without negation)
so that the result is odd-valued and can be distinguished from the scaled
negative values, which correspond to even-valued integers. For example,
let $x(i,j)$ = 128 and the Prediction Value = 130. Because the current image
value $x(i,j)$ is less than 130 and tmp (= 2*130) is less than (255+128), the
index is equal to 2·(130-128) = 4.

The residual or difference values $\Delta x$ can be efficiently represented
by entropy coding methods that use variable code lengths for each word.
In the present coding scheme, a variable-length code word corresponding
to the index computed as above is sent to the decoder as a part of the coded
bitstream. The specific variable length code used here is a modified
Golomb coding (MGC) technique as outlined in the following.

Entropy coders based on Huffman coding have been widely used in
the context of image and video coding. Although such entropy coding provides
a satisfactory result without requiring that extensive context analysis be
performed, this approach requires provision of large tables for lookup
operations and offers little flexibility for on-line adaptation. To address
the lookup table size requirements, a Golomb coding method could be
used, as set forth by S. Golomb in "Run-length encodings", I.E.E.E.
Trans. on Information Theory, vol. IT-12, 1966, pp. 399-401. However,
Golomb codes can only be optimal for certain probability distributions. To
alleviate this, an MGC technique was developed in a prior patent
application, "Memory Efficient Variable Length Encoding and Decoding
System", U.S. Serial No. 09/xxx,xxx. The MGC scheme uses a small table
for adaptability and requires no table for actual decoding of the symbols.
This scheme has the ability to optimally adapt to the varying statistical
characteristics of the data. The details are given in that patent application.

The basic Golomb code is usually specified by a parameter m,
preferably an integer greater than 1, for which a positive fraction p
satisfies $p^m = 0.5$ ($0.5 < p < 1$). For any non-negative integer n, a
variable-length code includes two parts: a variable length part,
representing the integer portion of a ratio n/m, written $[n/m]_{\mathrm{int}}$, and a
fixed-length part, n modulo m. To make the implementation simpler, m is
often restricted to be a power of 2 ($m = 2^k$ with $k \ge 1$), although this
requirement can be relaxed. The fixed-length part (n mod m) is
represented using k bits, where k is a selected integer satisfying
$2^{k-1} < m \le 2^k$. The variable portion of the ratio n/m, written $n' = [n/m]_{\mathrm{int}}$, is
represented using a run of n' zeroes, followed by a 1. As a result, the bit
length of any code word n is given by

$$\mathrm{bitLength}(n) = [n/m]_{\mathrm{int}} + k + 1, \tag{8}$$

where k (= $\log_2(m)$) is the length of the fixed length portion of the
representation. Although the implementation is simple, Golomb coding is
not optimal for most probability distributions. The optimality is further
compromised for values of m that are powers of two. The basic Golomb
coding is improved by introducing the following modification.
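
For the power-of-two case, the basic Golomb code described above can be sketched in C. The sketch is illustrative: the function names are hypothetical, the code word is emitted as a string of '0' and '1' characters rather than packed bits, and k is assumed to be smaller than the bit width of unsigned.

```c
#include <stddef.h>

/* Bit length of the code word for n when m = 2^k, per relation (8). */
unsigned golomb_bit_length(unsigned n, unsigned k)
{
    return (n >> k) + k + 1u;
}

/* Writes the code word for n as '0'/'1' characters into out and
   NUL-terminates it. Returns the number of code bits written, or 0 if
   the buffer of capacity cap is too small. */
size_t golomb_encode(unsigned n, unsigned k, char *out, size_t cap)
{
    unsigned q = n >> k;                 /* n' = integer part of n/m */
    size_t len = (size_t)q + k + 1u;
    size_t pos = 0;

    if (len + 1 > cap) return 0;
    while (q--) out[pos++] = '0';        /* run of n' zeroes */
    out[pos++] = '1';                    /* terminating 1 */
    for (unsigned b = k; b-- > 0; )      /* n mod m, k bits, MSB first */
        out[pos++] = ((n >> b) & 1u) ? '1' : '0';
    out[pos] = '\0';
    return len;
}
```

For n = 9 and k = 2 (m = 4), the quotient is 2 and the remainder is 1, giving the five-bit code word 00101.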

Let D be the set of all possible non-negative data fields. Divide D
into N bins,

$$D = B_0 \cup B_1 \cup \ldots \cup B_{N-1}, \tag{9}$$

where bin $B_i$ has a selected number $M_i$ of elements ($M_i \ge 1$). First assume
that the number $M_i$ is a power of two, expressed as

$$M_i = 2^{k(i)}. \tag{10}$$

If a data field (element) belongs to the bin $B_i$, the index within the bin can
be uniquely identified using k(i) bits, where the bit length is

$$\mathrm{bitLength}(B_i) = i + 1 + k(i). \tag{11}$$

A bin $B_i$ is identified using a run of i zeroes followed by a 1. Hence, a bin
$B_i$ can be represented using i+1 bits. This code can be uniquely specified
by the vector K (given D), which is given by

$$K = \{k(0), k(1), \ldots, k(N-1)\}. \tag{12}$$
To encode an input sample n, given the vector K, the following procedure
can be followed:

    prevIndex = 0;
    index = 1 << k(0);
    bin = 0;
    while (n >= index)
    {
        bin = bin + 1;
        prevIndex = index;
        index = index + (1 << k(bin));
    }
    encodeBin(bin);
    encodeOffset(n - prevIndex);

where encodeBin is a procedure for encoding the variable length portion
and encodeOffset is a procedure for encoding the fixed-length portion.

One can easily verify that by changing the vector K, one will be able 
to adapt the variable-length codes according to the underlying probability 
distribution. This is the method used in this image coder. 
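
The bin search above can be packaged as a self-contained C function that reports the bin, the offset within the bin and the resulting bit length i + 1 + k(i). This is a sketch under stated assumptions: the K-vector is taken to have five components with k(i) = k(4) for i > 4, ESCAPE handling is omitted, and the names MgcCode, k_of and mgc_locate are hypothetical.

```c
typedef struct {
    unsigned bin;      /* bin index i (coded as i zeroes and a 1) */
    unsigned offset;   /* position within the bin (k(i) bits) */
    unsigned bits;     /* total code length: i + 1 + k(i) */
} MgcCode;

/* k(i) for a five-component K-vector, with k(i) = k(4) for i > 4. */
static unsigned k_of(const unsigned K[5], unsigned bin)
{
    return K[bin < 4 ? bin : 4];
}

/* Locates the bin and offset of the non-negative sample n. */
MgcCode mgc_locate(unsigned n, const unsigned K[5])
{
    unsigned prev = 0;                     /* first value of current bin */
    unsigned index = 1u << k_of(K, 0);     /* first value of next bin */
    unsigned bin = 0;

    while (n >= index) {
        bin++;
        prev = index;
        index += 1u << k_of(K, bin);
    }
    MgcCode c = { bin, n - prev, bin + 1u + k_of(K, bin) };
    return c;
}
```

With K = {0, 1, 2, 2, 3}, the sample 0 costs a single bit, while K = {1, 1, 1, 1, 1} spends two bits on each of the samples 0 and 1; this is how the choice of K adapts the code to the data.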

For example, the coder may use the following 16-entry table for 
varying values of K: 



Table 2. K-Vector Components

Index     {k(0), k(1), k(2), k(3), k(4)}
  0       {6, 6, 6, 6, 6}
  1       {5, 5, 5, 5, 5}
  2       {4, 4, 4, 4, 4}
  3       {3, 3, 3, 3, 3}
  4       {2, 2, 2, 2, 2}
  5       {1, 1, 1, 1, 1}
  6       {2, 1, 1, 1, 2}
  7       {0, 1, 1, 1, 1}
  8       {5, 6, 6, 6, 6}
  9       {4, 5, 5, 5, 5}
 10       {3, 4, 4, 4, 4}
 11       {2, 3, 3, 3, 3}
 12       {2, 2, 1, 1, 2}
 13       {2, 3, 3, 4, 4}
 14       {0, 1, 2, 2, 3}
 15       {0, 1, 2, 2, 2}


For code words with a bit-length greater than 17, an Escape coding
technique is used. That is, the code word for ESCAPE is used, followed by
an M-bit index representing the differential value between the input and
the prediction. One constraint used here is k(i) = k(4) for i > 4. The
quantity k(8) is preferably used for ESCAPE. While better choices may
be available, this one was chosen. One example of a variable-length code for
ESCAPE is '000000001'. Another is '00000001'. In such an instance, all
code words with bit lengths greater than 16 are coded using
an Escape coding technique.

Several different MGC coding approaches can be used here, each
with a different set of K-vector components, and a table need not have 16
entries. A suitable alternative to Table 2 is shown in Table 3, in which two
K-vectors are changed relative to Table 2.



Table 3. K-Vector Components

Index     {k(0), k(1), k(2), k(3), k(4)}
  0       {6, 6, 6, 6, 6}
  1       {5, 5, 5, 5, 5}
  2       {4, 4, 4, 4, 4}
  3       {3, 3, 3, 3, 3}
  4       {2, 2, 2, 2, 2}
  5       {1, 1, 1, 1, 1}
  6       {7, 7, 7, 7, 7}
  7       {0, 1, 1, 1, 1}
  8       {5, 6, 6, 6, 6}
  9       {4, 5, 5, 5, 5}
 10       {3, 4, 4, 4, 4}
 11       {2, 3, 3, 3, 3}
 12       {2, 2, 1, 1, 2}
 13       {2, 3, 3, 4, 4}
 14       {0, 1, 2, 2, 3}
 15       {8, 8, 8, 8, 8}



The following example, with the choice M = 8, may clarify the
procedure. The coding includes the prescription

    x(i,j) = 255, Prediction = 20
    differential value = 235
    vld index = 255

Using the Modified Golomb Code {1,1,1,1,1}, the number of elements in
different bins is given by {2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, ...}, and the code word-lengths corresponding to the elements in each
bin are given by {2, 3, 4, 5, 6, 7, 8, 9, ESCAPE, 11, 12, 13, 14, 15, 16, 17,
18, ...}. If raw coding is used, the vld-index 255 belongs to bin 127, and
needs a total bit-length of 127 + 1 + 1 (to offset for ESCAPE) + 1 (to
uniquely represent the element in each bin) for a total of 130 bits.
However, with ESCAPE coding, a nine-bit code to represent an ESCAPE
code word, followed by an 8-bit code word to represent the vld-index 255,
is sent (code word '00000000111111111').
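
The bit counts in this example can be reproduced with a short C sketch. It assumes that the ESCAPE code word occupies bin position 8 (nine bits), that data bins at or after that position are shifted by one, and that the threshold for switching to ESCAPE is passed in as a parameter, since the text mentions both 17 and 16. The names raw_bits and coded_bits are hypothetical.

```c
enum { ESCAPE_BIN = 8, ESCAPE_BITS = 9 };

/* Raw MGC code length of sample n for a five-component K-vector with
   k(i) = k(4) for i > 4, when bin position ESCAPE_BIN is reserved. */
unsigned raw_bits(unsigned n, const unsigned K[5])
{
    unsigned bin = 0;
    unsigned index = 1u << K[0];

    while (n >= index) {
        bin++;
        index += 1u << K[bin < 4 ? bin : 4];
    }
    /* data bins at or beyond the ESCAPE position need one more zero */
    unsigned run = bin + (bin >= ESCAPE_BIN ? 1u : 0u);
    return run + 1u + K[bin < 4 ? bin : 4];
}

/* Code length actually sent: raw length, or ESCAPE plus an M-bit index
   when the raw length exceeds limit. */
unsigned coded_bits(unsigned n, const unsigned K[5],
                    unsigned M, unsigned limit)
{
    unsigned raw = raw_bits(n, K);
    return raw > limit ? (unsigned)ESCAPE_BITS + M : raw;
}
```

For K = {1, 1, 1, 1, 1} and the vld index 255, raw_bits gives 130 and coded_bits with M = 8 and a limit of 17 gives 17, matching the figures above.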



Because the pixel values are differentially coded, the vld index value
0 is the most frequently occurring index value within a block. The
zero-value index can be efficiently coded using the MGC table with k(0) = 0.
For typical images, consecutive zero-valued indices, or zero-runs, are also
common. For example, for a graphics image with block size PxQ = 8x8,
some zero-run statistics are shown in Figure 3. It is clear that zero-runs of
length 1 and length 8 are the most frequently occurring patterns in these
images. For a more general block size, PxQ, the most frequently
occurring patterns would be runs of lengths 1 and Q.

To handle zero-runs efficiently, zero-runs of length Q are
checked only at row boundaries of the PxQ block. For example, for an input block of 8x8
pixels, at most 8 zero-runs of length 8 are possible. If one or more
zero-runs of length Q are present, a bit is set to indicate the presence of this
condition, followed by a P-bit codeword to specify which rows of the PxQ
block have the zero-run. For rows that do not contain a zero-run of length
Q, the differential data are encoded using the MGC technique mentioned in
the preceding. The C code for this procedure is set forth in the following.

    lcount = 0;
    for (i=0; i<P; i++) {          /* for each block line */
        rowz = 0;
        for (j=0; j<Q; j++) {
            rowz += (diff[i*Q+j] == 0);   /* count zero residuals in row i */
        }
        ident[i] = (rowz==Q);      /* 1 if row i is an all-zero run */
        lcount += (rowz==Q);
    }
    if (lcount > 0) {
        putbits(1,1);              /* flag: at least one all-zero row */
        for (i=0; i<P; i++) putbits(1,ident[i]);   /* P-bit row mask */
    }
    else {
        putbits(1,0);              /* flag: no all-zero row */
    }

The present invention has been compared with other lossless coders 
for encoding monochrome images in the JPEG-2000 image set, and the 
results are set forth in Table 4. LOCO and TCQ are two leading methods 
for image compression that achieve very high compression ratios. The 
LOCO method is discussed by M. Weinberger et al, ibid. The TCQ 
method is discussed by F. Sheng et al, ibid. The GZIP method is discussed 
by J.L. Gailly and M. Adler in "GZIP Documentation and Sources", 
ftp://prep.ai.mit.edu/pub/gnu. 

All the images were monochrome, using an eight-bit representation for 
each pixel. The GZIP method, which is widely used in computer 
processing and is based on the Lempel-Ziv scheme and its extensions, 
was also applied to the set of images. The last column in Table 4, 
BALI, sets forth the results of the invention, for comparison. The BALI 
scheme provides compression ratios that are close to the LOCO and TCQ 
approaches for most images; and for one representative image, a target, 
BALI provides substantially greater compression ratios than those two 
approaches. 
Table 4. Bits/Pixel Required for Images in JPEG-2000 Set. 

Image      LOCO    TCQ     GZIP    BALI 
woman      4.45    4.39    6.13    4.52 
cafe       5.09    5.19    6.73    5.13 
bicycle    4.36    4.41    5.73    4.34 
target     2.19    2.13    1.08    1.56 
aerial2    5.29    5.24    6.01    5.39 



Another advantage of the invention arises from the possibility of 
parallel processing. Processing of data in blocks, use of the particular 
prediction schemes discussed herein and use of modified Golomb coding 
allows the system to process two or more blocks simultaneously, without 
requiring that each pixel value be processed seriatim and in a fixed order. 
This parallel processing approach provides a significant advantage in time 
required, in seconds, to compress and to decompress a given image. 




Coding of a given block of data according to the invention does not 
depend upon how any preceding block was coded. Thus, an image can be 
decomposed into an ordered sequence {R_i} (i = 0, 1, 2, ..., L-1) of 
two or more regions, as illustrated in one approach in Figure 4, and 
the regions can be coded in parallel. The only condition imposed on 
partitioning the image into two or more regions is that all blocks 
assigned to a region R_i should occur after all blocks assigned to the 
preceding region R_(i-1) (i ≥ 1) in the sequence, when a scan is 
performed in a left-to-right and top-to-bottom scanning order (referred 
to herein as a scanning order pattern). 

One approach for performing this decomposition into regions is to 
split the image into multiple horizontal rows of blocks, where the 
individual regions R_i can be coded simultaneously using 
multiprocessing software or specialized hardware. Because the coding is 
performed in parallel, the processing of each block can be performed at 
a lower clock frequency, with lower power requirements, while 
maintaining high throughput. After the parallel coding is completed, 
the L processed (or compressed) bitstreams are merged in a 
straightforward operation, as illustrated in Figure 5. 
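The decomposition into horizontal strips and the final merge can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of Figures 4 and 5: the names `Region`, `split_into_regions` and `merge_streams` are hypothetical, the regions are made as equal in size as possible, and each compressed stream is assumed to be byte-aligned so that merging reduces to concatenation in region order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Half-open range [first, last) of block rows assigned to one region. */
typedef struct { int first; int last; } Region;

/* Split block_rows rows of blocks into L contiguous regions, in
 * top-to-bottom scan order. The first (block_rows % L) regions
 * receive one extra row. */
void split_into_regions(int block_rows, int L, Region *regions)
{
    assert(L >= 1 && block_rows >= L);
    int base = block_rows / L, extra = block_rows % L, next = 0;
    for (int i = 0; i < L; i++) {
        regions[i].first = next;
        next += base + (i < extra ? 1 : 0);
        regions[i].last = next;
    }
}

/* Concatenate L independently produced byte streams in region order.
 * Returns the total number of bytes written, or 0 if cap is too small. */
size_t merge_streams(const unsigned char *const *streams,
                     const size_t *lengths, int L,
                     unsigned char *out, size_t cap)
{
    size_t total = 0;
    for (int i = 0; i < L; i++) {
        if (lengths[i] > cap - total)
            return 0;
        memcpy(out + total, streams[i], lengths[i]);
        total += lengths[i];
    }
    return total;
}
```

With 8 rows of blocks and L = 3, the sketch assigns rows 0-2, 3-5 and 6-7 to the three regions. Because every block in region R_i follows every block in R_(i-1) in scan order, concatenating the streams in index order preserves the scanning order pattern. A decoder that is also to run in parallel would additionally need the length of each stream, which this sketch does not record.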

A second level of parallelism is available by splitting an encoder 
into two sets of operations: (1) predictor selection and MGC index 
selection and (2) entropy coding. Each of these operations can be pipelined 
to further improve the computation performance. 

A third level of parallelism takes advantage of the fact that predictor 
selection and MGC selection involve similarly structured computations. 
Predictor selection, for example, requires that the residuals for different 
predictor choices be computed, using a common metric. Calculation of the 
metrics for different predictors can be done in parallel. A choice of the 
predictor with smallest residual is straightforward, after a metric is 
chosen. 

Figure 6 illustrates an implementation of this level of parallelism. A 
block of pixel values is received at H different predictor analyzers, 
numbered h = 0, 1, ... , H-1, in a first step 151. In step 153, each 
predictor analyzer computes the corresponding prediction error residual 
values for all the pixels in the block. The errors for each of the H 
analyzers are pooled and compared in step 155, using an appropriate 
error metric, such as sum of squares, or sum of absolute values, of the 
errors. The predictor with the lowest metric (including application of 
a tie breaker algorithm, if necessary) is chosen for the next stage or 
block, as part of step 155. Each of the predictor analyzers can be 
operated independently and in parallel to maximize system performance 
and/or to minimize the time required for preparing a compressed image. 
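The selection performed in steps 151-155 can be sketched in software as shown below. This is a minimal sketch under assumptions and not the predictor set of the invention: the three predictors (left neighbour, upper neighbour, and their average), the treatment of out-of-block neighbours as 0, and the tie breaker (lowest index wins) are all hypothetical choices made for illustration, as are the names `predict`, `sad_metric` and `select_predictor`. The metric is the sum of absolute values of the errors, one of the two metrics named in the text.

```c
#include <assert.h>
#include <stdlib.h>

enum { H = 3 };   /* number of hypothetical predictor analyzers */

/* Hypothetical predictors for pixel (r, c) of a row-major block of
 * width Q. Neighbours outside the block are treated as 0. */
int predict(int h, const unsigned char *blk, int Q, int r, int c)
{
    int left  = (c > 0) ? blk[r * Q + c - 1] : 0;
    int above = (r > 0) ? blk[(r - 1) * Q + c] : 0;
    switch (h) {
    case 0:  return left;
    case 1:  return above;
    default: return (left + above) / 2;
    }
}

/* Sum of absolute prediction errors of predictor h over a P x Q block. */
long sad_metric(int h, const unsigned char *blk, int P, int Q)
{
    assert(h >= 0 && h < H);
    long sum = 0;
    for (int r = 0; r < P; r++)
        for (int c = 0; c < Q; c++)
            sum += labs((long)blk[r * Q + c] - predict(h, blk, Q, r, c));
    return sum;
}

/* Index of the predictor with the lowest metric; on a tie the lowest
 * index wins. The H metric evaluations are mutually independent. */
int select_predictor(const unsigned char *blk, int P, int Q)
{
    int best = 0;
    long best_metric = sad_metric(0, blk, P, Q);
    for (int h = 1; h < H; h++) {
        long m = sad_metric(h, blk, P, Q);
        if (m < best_metric) {
            best_metric = m;
            best = h;
        }
    }
    return best;
}
```

The loop in `select_predictor` runs the analyzers one after another only for simplicity. Since each call to `sad_metric` reads the block and writes nothing shared, the H calls could be dispatched to separate threads or hardware units, with only the final comparison performed serially.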

The predictor calculations themselves exhibit some degree of 
parallelism, with similar computations being performed for each pixel 
in the image. This parallelism can be exploited in a single 
instruction, multiple data (SIMD) approach as a sequence of software 
instructions or as a special hardware block. Use of SIMD instructions 
has become commonplace in microprocessors today (for example, VIS on 
SPARC, MMX and SSE on x86, AltiVec on PowerPC, MAX on PA-RISC, and 
3DNow! on AMD) and in digital signal processors. 
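As one illustration of the SIMD approach, the per-pixel residual computation can be written so that sixteen 8-bit pixels are processed per instruction on x86 processors with SSE2, with a scalar loop handling the remainder and serving as a fallback elsewhere. This is a sketch under assumptions: the function name `residuals_u8` is hypothetical, and the residual is taken modulo 256 (so that it fits in eight bits and a decoder can recover the pixel as residual plus prediction, modulo 256), which may differ from the residual representation used by the invention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* res[i] = (cur[i] - pred[i]) mod 256 for n pixels. */
void residuals_u8(const uint8_t *cur, const uint8_t *pred,
                  uint8_t *res, size_t n)
{
    assert(n == 0 || (cur && pred && res));
    size_t i = 0;
#if defined(__SSE2__)
    /* Sixteen pixels per iteration; unaligned loads and stores. */
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(cur + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(pred + i));
        _mm_storeu_si128((__m128i *)(res + i), _mm_sub_epi8(a, b));
    }
#endif
    /* Scalar tail, and the whole loop when SSE2 is unavailable. */
    for (; i < n; i++)
        res[i] = (uint8_t)(cur[i] - pred[i]);
}
```

Both paths produce identical output, because `_mm_sub_epi8` performs wrap-around subtraction on each byte, matching the cast to `uint8_t` in the scalar loop.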

Figure 7 shows a block diagram of a general computer system 200, 
which may be used to implement various hardware components of the 
invention, such as a client, an applications server and a database 
management system. The computer system 200 includes a bus 208 or other 
communication mechanism for communicating information and a 
processor 210, coupled with the bus 208, for processing information. The 
computer system 200 also includes a main memory 212, such as a random 
access memory (RAM) or other dynamic storage device, coupled to the 
bus 208, for storing information and instructions to be executed by the 
processor 210. The main memory 212 also may be used for storing 
temporary variables or other intermediate information during execution 
of instructions by the processor 210. The computer system 200 further 
optionally includes read only memory (ROM) 214 or other static storage 
device, coupled to the bus 208, for storing static information and 
instructions for the processor 210. A storage device 216, such as a 
magnetic disk or optical disk, is provided and is coupled to the bus 208 
for storing information and instructions. 

The computer system 200 may also be coupled through the bus to a 
display 218, such as a cathode ray tube (CRT), for displaying information 
to a computer user. An input device 220, including alphanumeric and 
other keys, is coupled to the bus for communicating information and 
commands to the processor 210. Another type of user input device is a 
cursor control 222, such as a mouse, a trackball or cursor direction keys 
for communicating direction information and command selections to the 
processor 210 and for controlling cursor movement on the display 218. 
This input device typically has one degree of freedom in each of two axes, 
such as x- and y-axes, that allows the device to specify locations in a plane. 

The functionality of the invention is provided by the computer 
system 200 in response to the processor 210 executing one or more 
sequences of instructions contained in main memory 212. These 
instructions may be read into main memory 212 from another computer- 
readable medium, such as a storage device 216. Execution of the sequences 
of instructions contained in the main memory 212 causes the processor 
210 to perform the process steps described herein. In alternative 
embodiments, hard-wired circuitry may be used in place of, or in 
combination with, software instructions to implement the invention. 
Embodiments of the invention are not limited to any specific combination 
of hard-wired circuitry and software. 

The term "computer-readable medium", as used herein, refers to 
any medium that participates in providing instructions to the processor 
210 for execution. This medium may take many forms, including but not 
limited to non-volatile media, volatile media and transmission media. 
Non-volatile media includes, for example, optical and magnetic disks, 
such as the storage device 216. Volatile media includes dynamic memory 
212. Transmission media includes coaxial cables, copper wire and fiber 
optics and includes the wires that are part of the bus 208. Transmission 
media can also take the form of acoustic or electromagnetic waves, such 
as those generated during radiowave, infrared and optical data 
communications. 

Common forms of computer-readable media include, for example, a 
floppy disk, a flexible disk, a hard disk, magnetic tape or any other 
magnetic medium, a CD-ROM, any other optical medium, punchcards, 
papertape, any other physical medium with patterns of holes or apertures, 
a RAM, a ROM, a PROM, an EPROM, a Flash-EPROM, any other 
memory chip or cartridge, a carrier wave as described hereinafter, or any 
other medium from which a computer can read. 

Various forms of computer-readable media may be involved in 
carrying out one or more sequences of one or more instructions to the 
processor 210 for execution. For example, the instructions may initially be 
carried on a magnetic disk of a remote computer. The remote computer 
can load the instructions into its dynamic memory and send the 
instructions over a telephone line, using a modem. A modem local to the 
computer system 200 can receive the data over the telephone line and use 
an infrared transmitter to convert and transmit the data to an infrared 
detector connected to the computer system bus. The bus will carry the data 
to the main memory 212, from which the processor receives and executes 
the instructions. Optionally, the instructions received by the main memory 
212 can be stored on the storage device 216, either before or after 
execution by the processor 210. 

The computer system 200 also includes a communications interface 
224, coupled to the bus 208, which provides two-way data communication 
coupling to a network link 226 that is connected to a local area network 
(LAN) or to a wide area network (WAN). For example, the 
communications interface 224 may be an integrated services digital 
network (ISDN) card or a modem to provide a data communication 
connection to a corresponding type of telephone line. As another example, 
the communications interface 224 may be a local area network card to 
provide a data communication connection to a compatible LAN. Wireless 
links may also be implemented. In any such implementation, the 
communications interface 224 sends and receives electrical, 
electromagnetic or optical signals that carry digital data streams 
representing various types of information. 

The network link 226 typically provides data communication 
through one or more networks to other data devices. For example, the 
network link 226 may provide a connection through a LAN 228 to a host 
computer 230 or to data equipment operated by an Internet Service 
Provider (ISP) 232. The ISP, in turn, provides data communication 
services through the world wide packet data communication network, now 
commonly known as the "Internet" 234, served by one or more servers 
236. The LAN 228 and the Internet 234 both use electrical, 
electromagnetic and/or optical signals to carry the digital data streams. 
The signals carried by these networks, the signals carried on the network 
link 226 and the signals carried on the communications interface 224, are 
examples of carrier waves that transport the information. 



